ABSTRACT

Bacterial genomes encode a plethora of small 
RNAs (sRNAs), which are heterogeneous in size, 
structure and function. Most sRNAs act as post- 
transcriptional regulators by means of specific 
base pairing interactions with the 5'-untranslated
region of mRNA transcripts, thereby modifying the 
stability of the target transcript and/or its ability to 
be translated. Here, we present RNApredator, a 
web server for the prediction of sRNA targets. The 
user can choose from a set of over 2155 genomes 
and plasmids from 1183 bacterial species. 
RNApredator then uses a dynamic programming 
approach, RNAplex, to compute putative targets. 
Compared to web servers with a similar task, 
RNApredator takes the accessibility of the target 
during the target search into account, improving 
the specificity of the predictions. Furthermore,
enrichment analyses for Gene Ontology terms and
cellular pathways, as well as changes in accessibility
along the target sequence, can be computed in fully
automated post-processing steps. The underlying
dynamic programming approach, RNAplex, has a
predictive performance similar to that of more
complex methods, but needs at least three orders of
magnitude less time to complete. RNApredator is
available at http://rna.tbi.univie.ac.at/RNApredator.

INTRODUCTION 

Bacterial small RNAs (sRNAs) are very heterogeneous in
size, structure and function (1). Despite notable
exceptions, most sRNAs act as post-transcriptional
regulators by interacting with the 5'-untranslated region of
mRNA transcripts (2). Similar to miRNAs in eukaryotes,
sRNAs may target more than one mRNA and, conversely,
an mRNA may be targeted by more than one sRNA. In
contrast to miRNAs, however, sRNAs may cause both
down- and upregulation of their targets (3-5). This effect
depends on the exact location of the interaction region
and its effect on the structure of the target mRNA.

Many approaches have been developed to find sRNA 
targets. BLAST was successfully used to identify targets 
for micC (6) and istR-1 (7). TargetRNA (8,9) implements 
a Smith-Waterman (10) recursion scoring the base pairing 
potential of two RNAs. A slightly more complex model is 
used by Mandin et al. (11), where base pair stacks are 
scored according to the standard RNA folding energy 
model (12,13) and bulge penalties are optimized so that 
known interactions rank high. 
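
The idea of a Smith-Waterman-style recursion that scores base pairing potential rather than sequence homology can be illustrated with a minimal sketch. The pair scores, mismatch score and gap penalty below are hypothetical values chosen for illustration only; they are not the scoring scheme of TargetRNA or of any other published tool, and the function name is likewise an assumption.

```python
# Hypothetical scores for illustration; not the parameters of any published tool.
PAIR_SCORE = {
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "C"): 3, ("C", "G"): 3,
    ("G", "U"): 1, ("U", "G"): 1,
}
MISMATCH = -2
GAP = -3


def best_pairing_score(srna, target):
    """Best local score for antiparallel base pairing between two RNAs."""
    # Duplexes are antiparallel, so the sRNA is aligned against the
    # reversed target and complementarity is rewarded instead of identity.
    rev = target[::-1]
    prev = [0] * (len(rev) + 1)
    best = 0
    for a in srna:
        cur = [0]
        for j, b in enumerate(rev, start=1):
            diag = prev[j - 1] + PAIR_SCORE.get((a, b), MISMATCH)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + GAP, cur[j - 1] + GAP))
            best = max(best, cur[j])
        prev = cur
    return best
```

With these illustrative scores, a perfectly complementary pair of sequences receives the sum of its pair scores, while two sequences that cannot pair at all score zero.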

More general approaches that describe RNA-RNA interactions
based on the RNA folding energy model and
consider the target site accessibility, such as IntaRNA (14),
RNAup (15,16) or biRNA (17), greatly improved sRNA-target
predictions at the cost of an increased computation
time.

In this contribution, we present RNApredator, a web
server dedicated to the genome-wide prediction of sRNA
targets in bacterial genomes. The main machinery used by
RNApredator is RNAplex (18,28), a new approach for
RNA-RNA interaction search, which has a prediction
accuracy similar to that of algorithms that explicitly
consider intramolecular structures, while running at least
three orders of magnitude faster than RNAup or
IntaRNA. In addition to the improved run time,
RNApredator offers the user a graphical overview of
the accessibility around the target ribosomal binding
sites upon sRNA binding, as well as a Gene Ontology
enrichment analysis for a set of user-selected genes of
interest.

Figure 1. Results page. After the target search is completed, RNApredator presents the list of the 100 best interactions. Each line contains the
rank, total energy of interaction, corresponding Z-score, duplex structure in dot-bracket format, interaction coordinates on the sRNA and mRNA,
gene annotation, locus tag, strand, genomic coordinates of the target, NCBI accession number as well as the type of replicon the target is located on.
Results can be filtered based on the coordinates of the target locations (for up to 500 interactions). Moreover, it is possible to limit the displayed
interactions to the 25, 50, 75, 100 or 500 best interactions. Finally, the complete results table can be downloaded in .csv or raw RNAplex format.




DESCRIPTION OF THE WEBSERVER 

Functionality of RNApredator 

For all annotated mRNAs in the selected target sequences,
RNApredator computes several relevant interaction
characteristics by launching RNAplex. Thanks to its
ability to consider target accessibility, RNAplex
reaches prediction accuracies similar to those of more
complex and computationally much more demanding methods,
while being at least three orders of magnitude faster than
alternative methods considering target site accessibility
(see Supplementary Data for more information).
RNApredator is thus applicable to genome-wide sRNA
target prediction.

After completing the computation of all candidate 
sRNA-target interactions, RNApredator returns a list 
of target sites sorted by the energy of interaction. In
addition, an enrichment analysis of GO terms is performed
for all or a user-defined subset of the predicted
interactions.

Furthermore, the influence of sRNA binding to its
target on the accessibility of the ribosomal entry site can
be studied with RNAup, which helps to predict whether the
sRNA will act as a positive or negative regulator at a
particular target site (16).

Other tools 

While a large number of tools are available for the
prediction of miRNA targets in eukaryotes [for a review see
(19)], comparatively little effort has been invested to
characterize targets of sRNA regulators. At present, the only
web server specifically advertised for target prediction in
prokaryotes is TargetRNA (8,9), which implements a
modification of the Smith-Waterman (10) dynamic
programming algorithm that assesses base pairing potential
instead of base homology. This is achieved with the
help of a custom-tailored scoring system. Alternatively,
TargetRNA can also be run with thermodynamic parameters
for RNA folding (12,13), at the expense of a run time
increased by at least an order of magnitude (8).

IntaRNA (14,20) also allows the user to search for
sRNA-mRNA duplexes with a more realistic energy model (12,13)
and an increased specificity owing to the inclusion of target
and query secondary structure information. It can be
employed for target search in bacterial genomes. There
is also a web server based on RNAup (21) available.
Thanks to its unapproximated energy model, RNAup
describes the thermodynamics of mRNA-sRNA interactions
more precisely than RNAplex. Still, the
high run time of RNAup, as well as the inability of the
RNAup web server to handle more than a pair of sequences
at a time, make it impractical for genome-wide target
search.

Input 

RNApredator takes as input a single sRNA sequence
consisting of lower or uppercase [A,T,C,G,U] letters,
where T is automatically converted into U. The targets
of this sequence can be searched against the ensemble of
plasmids/chromosomes referred to by an NCBI taxonomy ID
or a specific plasmid/chromosome referred to by an NCBI
accession number. Currently, 1183 bacterial species are
available, encompassing a total of 2155 chromosomes
and plasmids. Alternatively, the species of interest can be
chosen from a taxonomic tree.

Once the desired genome has been selected, an sRNA
sequence should be entered in the sRNA sequence field.
The target search is launched after the predict button has
been pressed. Targets are searched for each annotated
gene, including 5'- and 3'-UTR. The 5'-UTR and
3'-UTR regions are defined as the 200 nt regions directly
upstream and downstream of the coding sequence.
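
The input handling and the definition of the searched region can be sketched as follows. This is an illustration under stated assumptions and not the server's Perl code: the function names, the 1-based inclusive coordinate convention and the clipping at the replicon ends (which ignores circular replicons) are all assumptions.

```python
import re


def normalize_srna(seq):
    """Uppercase the sequence, convert T to U and reject any other letter."""
    s = "".join(seq.split()).upper().replace("T", "U")
    if re.fullmatch(r"[ACGU]+", s) is None:
        raise ValueError("sRNA must consist of A, C, G, U or T only")
    return s


def search_window(cds_start, cds_end, replicon_length, flank=200):
    """Region searched for one gene: the CDS plus `flank` nt on each side.

    Coordinates are assumed to be 1-based and inclusive; the window is
    simply clipped at the replicon ends (circularity is ignored here).
    """
    return max(1, cds_start - flank), min(replicon_length, cds_end + flank)
```

Because the same flank length is used on both sides, the window does not depend on the strand of the gene in this simplified form.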

Output after submission of the sRNA

RNApredator returns the target predictions. The
results should be similar to those of the accessibility-based
RNAplex and better than those of RNAplex without accessibility
information. Still, different parameters used to compute
accessibility profiles with RNAplfold lead to different
accessibilities and consequently to different RNAplex
results. In the case of the sRNA micA in Escherichia coli
(NC_000913), RNApredator needs ~5 min to finish the
computation, scanning the full coding sequence and 200 nt
upstream of the start codon. TargetRNA needed 40 s for
the whole genome, processing each coding sequence from
20 nt upstream of the translation start to 30 nt downstream,
with a seed length set to 1 and G:U pairs allowed.

The IntaRNA web server is much slower, as it takes 3 h
to finish the computation, under the additional constraint
that for each gene only subsequences of up to
500 nt can be searched.

The web server outputs a table of the 100 most stable 
duplexes found by RNAplex (see Figure 1). Each line of 
the table contains the energy of interaction, i.e. the raw 
hybridization energy corrected for the opening energies on 
both the target and the sRNA sequences, the correspond- 
ing Z-score, which is useful for comparing interactions 
involving different sRNAs, the duplex structure in 
dot-bracket format, the start and end of the duplex on 
the target and query sequences, gene annotation, the 
NCBI accession number, genomic coordinates, as well as 
the type of replicon where the gene is found (chromosome/ 
plasmid). Results can be sorted by all duplex characteristics,
with the exception of the hybrid structure.
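
The two numerical columns of the results table can be illustrated with a short sketch. The energy correction follows the description above (hybridization energy plus the opening energies of both partners). The Z-score formula, in contrast, is an assumption: the text does not state how the Z-score is computed, and the sketch simply standardizes each energy against all energies predicted for the same sRNA. Function names are hypothetical.

```python
from statistics import mean, pstdev


def interaction_energy(hybridization, opening_target, opening_srna):
    """Energy of interaction in kcal/mol.

    The raw hybridization energy (negative) is corrected by the cost of
    opening the interaction site on the target and on the sRNA (positive).
    """
    return hybridization + opening_target + opening_srna


def z_scores(energies):
    """Assumed Z-score: (energy - mean) / standard deviation over all hits."""
    mu = mean(energies)
    sd = pstdev(energies)
    if sd == 0:
        return [0.0 for _ in energies]
    return [(e - mu) / sd for e in energies]
```

Under this assumption, strongly negative Z-scores mark interactions that are unusually stable relative to the background of all predicted duplexes, which is what makes the values comparable between different sRNAs.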

Even though most sRNAs act in the vicinity of the
5'-end of the target RNA, there is growing evidence that
sRNAs may also exert their effects by binding in the coding
sequence region (22,23). In order to concentrate on the
region of interest, the user can filter the duplexes by
setting a position filter (for up to 500 interactions) on
the target site coordinates. Further filtering is achieved by
limiting the number of returned duplexes to 25, 50 or 75. If
desired, the user can increase the number of displayed
interactions to 500 or to the complete results returned
by RNAplex.
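
The position filter and the limit on the number of displayed duplexes amount to a simple selection over the results table, sketched below. The record layout (dictionaries with the keys `energy`, `target_start` and `target_end`) and the function name are hypothetical and serve only as an illustration.

```python
def filter_hits(hits, start=None, end=None, top=100):
    """Keep hits whose target site lies within [start, end], best energy first.

    `hits` is a list of dicts with the hypothetical keys 'energy',
    'target_start' and 'target_end'; `top` limits the number of rows.
    """
    kept = [
        h for h in hits
        if (start is None or h["target_start"] >= start)
        and (end is None or h["target_end"] <= end)
    ]
    # Lower (more negative) energies correspond to more stable duplexes.
    kept.sort(key=lambda h: h["energy"])
    return kept[:top]
```

For example, `filter_hits(hits, start=-100, end=50, top=25)` would restrict the table to the 25 most stable duplexes located close to the start codon, provided the coordinates are expressed relative to it.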

The left-most column allows the user to select genes of
interest for further post-processing (Figure 2b), in particular
the analysis of the accessibility around the target site
for the bound (green line in Figure 2c and d) and unbound
target (red line). These accessibility profiles are computed
with RNAup. This adds important information since many
sRNAs regulate their targets by changing the accessibility
of the ribosomal binding site (5,24). Therefore, the difference
in the accessibility before and after binding (black
line), the position of the start codon (cyan vertical line)
as well as the boundaries of the target site (blue vertical
lines) are displayed (Figure 2c and d). In the case of the
RprA-rpoS and DsrA-rpoS duplexes (bottom left and
right of Figure 2), for instance, the interactions take place
100 nt upstream of the start codon, but increase the
accessibility of the region around the start codon (Figure 2c
and d). Both interactions lead to a reduction of up to
4 kcal/mol of the opening energy around the start codon,
leading to a strong upregulation of rpoS (5,24).
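
The difference profile shown in the accessibility plots is the position-wise difference between the opening energies of the bound and the unbound target. A minimal sketch is given below; it assumes that both profiles are already available as lists of opening energies (in kcal/mol) and does not call RNAup. The function names and the default window radius are hypothetical.

```python
def accessibility_change(unbound, bound):
    """Position-wise opening energy difference (bound minus unbound).

    Negative values mean that the region is easier to open, i.e. more
    accessible, once the sRNA is bound.
    """
    if len(unbound) != len(bound):
        raise ValueError("profiles must have the same length")
    return [b - u for u, b in zip(unbound, bound)]


def mean_change_near_start(diff, start_index, radius=10):
    """Average change in a window of +/- `radius` positions around the start codon."""
    lo = max(0, start_index - radius)
    hi = min(len(diff), start_index + radius + 1)
    window = diff[lo:hi]
    return sum(window) / len(window)
```

A clearly negative average around the start codon would be consistent with upregulation of the target, a clearly positive one with downregulation.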

To better understand the function of the sRNA of
interest, RNApredator provides an enrichment analysis
of GO terms in the set of selected targets. For each GO
category (Biological Process, Molecular Function,
Cellular Component), the 20 most highly enriched terms are
returned in tabular format. The GO-ID, annotated
term, total number of genes linked to this GO-ID, total
number of predicted targets linked to this GO-ID, number
of expected linked targets and the P-value are
returned. The results can be sorted by any of the
above characteristics.
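
The relation between the columns of the enrichment table can be made explicit with a small sketch. It assumes a one-sided hypergeometric (Fisher-type) test, which is a common choice for GO enrichment; the test actually used by the server through the TopGO library is not specified in the text, so this is an illustration rather than a description of the implementation. The function name is hypothetical.

```python
from math import comb


def go_enrichment(total_genes, annotated, selected, selected_annotated):
    """Expected count and one-sided hypergeometric P-value for one GO term.

    total_genes        -- annotated genes in the genome
    annotated          -- genes linked to the GO term
    selected           -- selected predicted targets
    selected_annotated -- selected targets linked to the GO term
    """
    expected = selected * annotated / total_genes
    upper = min(annotated, selected)
    # Probability of drawing at least `selected_annotated` annotated genes
    # when `selected` genes are sampled without replacement.
    tail = sum(
        comb(annotated, k) * comb(total_genes - annotated, selected - k)
        for k in range(selected_annotated, upper + 1)
    )
    return expected, tail / comb(total_genes, selected)
```

For instance, if 5 of 10 genes carry a term and both of 2 selected targets carry it, the expected number is 1 and the P-value is 10/45.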

Finally, the post-processing page shows in greater
detail the relevant characteristics of the duplex (ASCII
string) and allows the user to download the sequences of the
target and sRNA by following the mRNA and sRNA
sequence links, respectively.

Implementation details 

RNApredator was implemented in Perl 5. It uses the
JavaScript library jQuery (jquery.com) to allow sorting of
the results table. Computation of the accessibility
profiles in the post-processing steps is performed with
the help of the RNAup program. RNApredator relies
on different databases. The bacterial genomes were downloaded
from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria),
while taxonomy data were retrieved from
NCBI (ftp://ftp.ncbi.nih.gov/pub/taxonomy). All available
bacterial GO term flatfiles, which are necessary for the GO
term enrichment analysis, were downloaded from
ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/proteomes. The
computation of the GO term enrichment is based upon these
files and an R-script based on the TopGO (25) library.

The most time consuming step in the interaction predic- 
tion is the computation of accessibilities along the bacter- 
ial genome. In order to speed up the calculation, we have 
precomputed the accessibility profiles for all genomes 
using RNAplfold (26,27). 

BENCHMARK 

RNApredator was benchmarked against TargetRNA
for a set of 30 interactions retrieved from the literature.
For each experimentally confirmed interaction, the
number of better scoring interactions was computed for
both prediction tools. The ranking procedure only considered
interactions predicted to be located either between
positions -150 and +100 or between positions -30 and +20
relative to the start codon (see Table 1). Overall, 73% of
the interactions (22 of 30) ranked higher in RNApredator
than in TargetRNA. TargetRNA was used with a hybridization
length of 1, with G:U pairs allowed and with a
P-value threshold set to 100. The sRNA was always
characterized as a new sequence. It should be noted that
the TargetRNA thermodynamic energy scoring was not
able to return any result. For this reason, the benchmark
reports only the results for the sequence-based
energy scoring.

Figure 2. Post-processing page: detailed information about interactions selected on the results page. (a) The upper part of the three GO term class
tables, i.e. biological process, molecular function or cellular component. The 20 most significant GO terms are shown in a separate table. Each line of
the table contains the GO term ID, human readable term, total number of annotated genes linked to the GO term, number of selected targets linked
to the GO term, expected number of targets linked to the GO term and P-value of the GO term enrichment. (b) The selected interactions are shown in
detail; relevant duplex characteristics are recapitulated and a graphical representation of the duplex structure is shown. mRNA and sRNA sequences
can be downloaded in .fasta format. The Calculate-link enables the user to get a plot of the opening energy for all stretches of 4 nt for the region
around the start codon before (red line) and after (green line) the sRNA binding. The accessibility difference is shown as a black line [(c) rpoS-RprA,
(d) rpoS-DsrA]. The 5'-end of the start codon is represented with a cyan line and the interaction site with two blue lines. Recalculation of the duplex
is possible by using the Calculate-link to the RNAup web server.
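
The ranking procedure used in the benchmark, i.e. counting the predictions that score better than the experimentally confirmed interaction within a window around the start codon, can be sketched as follows. The tuple layout, the use of a single position per interaction and the function name are simplifying assumptions made for this illustration; the sketch is not the script used to produce Table 1.

```python
def rank_of_known(predictions, known_locus, window=(-150, 100)):
    """Number of predictions scoring better than the confirmed interaction.

    `predictions` is an iterable of (locus_tag, energy, position) tuples,
    with the position given relative to the start codon. Returns None when
    the confirmed target has no predicted interaction inside the window.
    """
    lo, hi = window
    in_window = [(tag, e) for tag, e, pos in predictions if lo <= pos <= hi]
    known = [e for tag, e in in_window if tag == known_locus]
    if not known:
        return None
    best_known = min(known)
    # Lower energy means a better score; hits on the confirmed gene
    # itself are not counted against it.
    return sum(1 for tag, e in in_window if tag != known_locus and e < best_known)
```

Calling the function once with `window=(-150, 100)` and once with `window=(-30, 20)` corresponds to the two search regions compared in the benchmark.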

The RNAup web server was not used in the benchmark
as it is designed to give an in-depth understanding of the
thermodynamics of an sRNA-mRNA interaction, rather
than searching genome wide for putative targets.
Furthermore, the high time complexity of the RNAup
algorithm prevents it from returning putative targets in a
reasonable amount of time (see Supplementary Data). Still, the
users of RNApredator can use RNAup to study interactions
of interest during the post-processing step of
RNApredator.

DISCUSSION 

RNApredator is a freely available web server that facilitates
the search for putative sRNA targets in bacterial
genomes. Predictions from RNApredator reach the
accuracy of more complex methods like RNAup, IntaRNA
or biRNA, while saving at least three orders of magnitude
of CPU time. This allows the user to search for sRNA targets
in bacterial species in a few minutes, compared to hours or
days for IntaRNA or RNAup, respectively.

Table 1. Summary of TargetRNA and RNApredator ranking of 30 experimentally confirmed interactions
The first column contains the NCBI accession ID of the species; the species name is indicated in the second column. The third and fourth columns
contain the sRNA and mRNA gene tag, the fifth column shows the locus tag and the sixth and seventh columns contain the rank of the interaction
for TargetRNA and RNApredator. In the last two columns, the number in parentheses corresponds to the rank when the target search is
constrained to a region located 30 nt upstream and 20 nt downstream of the start codon, while the other numbers correspond to the rank for the
region spanning 150 nt upstream and 100 nt downstream of the start codon. NF stands for not found (TargetRNA does not return targets with a
rank > 100, and RNApredator hits also contain suboptimal interactions). B.s. is Bacillus subtilis subsp. subtilis str. 168, E.c.O is Escherichia coli
O127:H6 str. E2348/69, E.c.K. is Escherichia coli str. K-12 substr. MG1655, L.m. is Listeria monocytogenes EGD-e and V.c. is Vibrio cholerae O1
biovar El Tor str. N16961.

Unique features of the RNApredator web server are 
the post-processing steps. The computation of accessibility 
changes of the target upon sRNA binding may help in 
deciding whether the target will be up- or downregulated. 
The GO term enrichment allows the user to further filter the
targets in order to select genes that belong to the group
of highly enriched terms.